Abstract. We examine various methods for combining the outputs of
one-class models. In particular, we show that a simple meta-learning-based
ensemble achieves better results than weighting methods. Furthermore, we
propose a new one-class ensemble scheme, called TUPSO, that uses
meta-learning to combine multiple one-class classifiers. We also present a new
one-class classification performance measure for weighting the base-classifiers,
a process that proved helpful for increasing the classification performance
of the induced ensemble. Our experimental study shows that the proposed
method significantly outperforms existing methods.

1 Introduction and Background 

One-class classification aims to differentiate instances of a class of interest from all
other instances. The one-class classifier is trained on a training set containing
only instances of that class. Many one-class classification algorithms have
been investigated [1, 2]. This variety of classification algorithms is both positive
and negative. On the one hand, there are plenty of techniques to choose from,
but on the other hand, choosing the right one can be difficult. This is because
evaluating the performance of one-class classifiers is problematic. By definition, the
data collections contain only one-class examples, and thus performance metrics
such as false positives (FP) and true negatives (TN) cannot be computed. In
the absence of FP and TN, derived performance metrics, such as classification
accuracy, precision, AUC and others, cannot be computed. One way to address
this difficulty is to use an ensemble of classifiers as opposed to choosing the
best classifier. The main idea behind the ensemble methodology is to weigh
several individual classifiers and combine them in order to obtain a classifier that
outperforms them all. Indeed, previous work in supervised ensemble learning
shows that combining classification models can produce a better classifier in terms
of prediction accuracy [3].

Compared to supervised learning, research on one-class ensembles
is limited [4]. In particular, only one combining method, the Fix-rule ensemble,
was considered for one-class ensembles [5][6][7]. In this method, the combiner regards
each participating classifier's output as a single vote, upon which it applies
some aggregative function (a combining rule) to produce a final classification.
Henceforth we use the notation $I(\cdot)$ for the indicator function, $P_k(x|\omega_{Tc})$ for the
estimated probability of instance $x$ given the target class $\omega_{Tc}$, $f_{T,k}$ for the fraction
of the target class that should be accepted for classifier $k$, and $\theta_k$ for
the classification threshold for classifier $k$. A list of fixed combining rules is
presented in Table 1.
The fixed rule ensemble techniques, however, are not optimal, as the rules are 



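Combining Rule            Combination Rule Formula
------------------------  ------------------------------------------------------------
Majority voting (Voting)  $y(x) = I_{>k/2}\left(\sum_k I(P_k(x|\omega_{Tc}) > \theta_k)\right)$
Mean vote                 $y(x) = \frac{1}{k} \sum_k I(P_k(x|\omega_{Tc}) > \theta_k)$
Mean weighted vote        $y(x) = \frac{1}{k} \sum_k \left[ f_{T,k} I(P_k(x|\omega_{Tc}) > \theta_k) + (1 - f_{T,k}) I(P_k(x|\omega_{Tc}) > \theta_k) \right]$
Average rule              $y(x) = \frac{1}{k} \sum_k P_k(x|\omega_{Tc})$
Max rule                  $y(x) = \arg\max_k \left[ P_k(x|\omega_{Tc}) \right]$
Product rule              $y(x) = \prod_k \left[ P_k(x|\omega_{Tc}) \right]$
Exclusive voting rule     $y(x) = I_1\left(\sum_k I(P_k(x|\omega_{Tc}) > \theta_k)\right)$
Weighted votes product    $y(x) = \frac{\prod_k [f_{T,k} I(P_k(x|\omega_{Tc}) > \theta_k)]}{\prod_k [f_{T,k} I(P_k(x|\omega_{Tc}) > \theta_k)] + \prod_k [(1 - f_{T,k}) I(P_k(x|\omega_{Tc}) < \theta_k)]}$

Table 1: Fixed combining rules.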



usually assigned statically and independently of the training data. In order to find
better ensemble solutions, researchers tend to import ideas from other learning
domains. Unfortunately, the absence of negative examples in the training sets of
one-class problems makes the adoption of many fine ensemble schemes difficult,
especially those that rely on some performance metric, e.g. Weighting
Performance and Grading. The ensemble method we pursue here, TUPSO, is
based on meta-learning, aided by a one-class performance evaluator. We show
that this method is indeed practical and worthwhile for one-class ensemble learning.
Moreover, it significantly outperforms the Fix-rule technique.
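
As a rough illustration of how fixed combining rules operate, the following Python sketch applies four of the rules from Table 1 to the outputs of a handful of base-classifiers. The function names, the sample probabilities and the thresholds are hypothetical and chosen only for illustration; the strict comparison against the threshold is an assumption, and the sketch is not the implementation used in the experiments.

```python
from math import prod


def votes(probs, thresholds):
    """One 0/1 vote per base-classifier: 1 if its output exceeds its threshold."""
    return [1 if p > t else 0 for p, t in zip(probs, thresholds)]


def majority_voting(probs, thresholds):
    """Accept (1) when more than half of the base-classifiers vote for the target class."""
    v = votes(probs, thresholds)
    return 1 if sum(v) > len(v) / 2 else 0


def mean_vote(probs, thresholds):
    """Fraction of base-classifiers that vote for the target class."""
    v = votes(probs, thresholds)
    return sum(v) / len(v)


def average_rule(probs):
    """Mean of the estimated target-class probabilities."""
    return sum(probs) / len(probs)


def product_rule(probs):
    """Product of the estimated target-class probabilities."""
    return prod(probs)


# Hypothetical outputs of three base-classifiers for a single instance.
probs = [0.9, 0.8, 0.3]
thresholds = [0.5, 0.5, 0.5]

print(majority_voting(probs, thresholds))  # two of three classifiers accept
print(mean_vote(probs, thresholds))
print(average_rule(probs))
print(product_rule(probs))
```

Note that none of these rules looks at the training data: the combining function is fixed in advance, which is the limitation that motivates a learned combiner.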



2 Meta-Learning-Based Ensembles for One-Class Problems 

The proposed ensemble method, TUPSO, is intended to function in one-class
scenarios where multiple, and possibly diverse, classifiers exist. Its main task is
to combine one-class base-classifiers via a meta-classifier. TUPSO is aided by a
heuristic performance evaluator, which estimates the classification performance
of each of the base-classifiers and outputs a performance vector,
$Perf_{vect} = \{Perf_1, \ldots, Perf_m\}$. The estimated performance is then translated into static
weights, $\alpha_i$, which the meta-learning algorithm uses: $\alpha_i = Perf_i \cdot 1 / \sum_{j=1}^{m} Perf_j$,
$\forall i = 1 \ldots m$, where $m$ is the number of instances.
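
The weighting step described above amounts to dividing each estimated performance score by the sum of all scores, so that the weights sum to one. A minimal Python sketch follows; the function name and the sample scores are hypothetical, and rejecting a non-positive sum is an assumption added here for robustness.

```python
def performance_weights(perf):
    """Translate estimated performance scores into static weights that sum to 1."""
    total = sum(perf)
    if total <= 0:
        # Assumption: the scores are non-negative and at least one is positive.
        raise ValueError("performance scores must have a positive sum")
    return [p / total for p in perf]


# Hypothetical performance estimates for three base-classifiers.
weights = performance_weights([0.9, 0.6, 0.5])
print(weights)
```

A classifier with a higher estimated performance therefore contributes proportionally more to the weighted aggregate features.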

The TUPSO ensemble, as shown in Figure 1, is made up of four major components:
(1) Base-Classifiers, (2) Performance Evaluator, (3) Aggregate Features
Extractor and (4) Meta Classifier.

One of the building blocks of meta-learning is the meta-features, which are 
measured properties of the base-classifier(s). A collection of meta-features makes
a meta-instance, which is used for training the meta-classifier and upon which 
the trained meta-classifier produces the ensemble prediction. The Aggregate 
Features Extractor produces meta-features by using multiple aggregations of the 



Fig. 1: The TUPSO ensemble scheme. (Diagram labels: Instance; Base Classifiers
(Classifier 1, Classifier 2, ..., Classifier k); Predictions; Aggregate Feature
Extractor; Meta-Level Features; Classifiers' Performance.)



base-classifiers' outputs. Let $P_m = \langle p_{m_1}, \ldots, p_{m_k} \rangle$ be the vector containing the
base-classifiers' outputs $p_{m_1}, \ldots, p_{m_k}$, where $k$ is the number of base-classifiers. In
Table 2 we define six such aggregate meta-features, $F_{aggr} = \{f_1, \ldots, f_6\}$. During
the training phase, the meta-features are used for training the meta-classifier.
In the on-line phase, they are used by the meta-classifier, which produces the final
prediction.



Feature Name                      Abbreviation  Feature Definition
--------------------------------  ------------  ------------------------------------------------
Sum of Votes                      SumV          $f_1(P_m) = \sum_{i=1}^{k} 1_{\{p_{m_i} > 0.5\}}(p_{m_i})$
Sum of Predictions                SumP          $f_2(P_m) = \sum_{i=1}^{k} p_{m_i}$
Sum of Weighted Predictions       SumWP         $f_3(P_m) = \sum_{i=1}^{k} \alpha_i * p_{m_i}$
Variance of Votes                 VarV          $f_4(P_m) = Var(1_{\{p_{m_i} > 0.5\}}(p_{m_i}))$
Variance of Predictions           VarP          $f_5(P_m) = Var(P_m)$
Variance of Weighted Predictions  VarWP         $f_6(P_m) = Var(\alpha * P_m)$

Table 2: Aggregate features, produced by the Aggregate Features Extractor
module.
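
To make the six aggregate features of Table 2 concrete, the sketch below computes them for a single instance from the base-classifiers' outputs and their weights. The function name, the dictionary keys and the sample values are hypothetical. The source does not say whether the variance is the population or the sample variance; the population variance is assumed here.

```python
from statistics import pvariance


def aggregate_features(preds, weights, vote_cutoff=0.5):
    """Compute the six aggregate meta-features for one instance.

    preds   -- outputs of the k base-classifiers for the instance
    weights -- one static weight per base-classifier
    """
    votes = [1 if p > vote_cutoff else 0 for p in preds]
    weighted = [a * p for a, p in zip(weights, preds)]
    return {
        "SumV": sum(votes),
        "SumP": sum(preds),
        "SumWP": sum(weighted),
        # Assumption: population variance (divide by k, not k - 1).
        "VarV": pvariance(votes),
        "VarP": pvariance(preds),
        "VarWP": pvariance(weighted),
    }


# Hypothetical outputs and weights for three base-classifiers.
features = aggregate_features([0.9, 0.8, 0.3], [0.5, 0.3, 0.2])
for name, value in features.items():
    print(name, value)
```

The resulting dictionary corresponds to one meta-instance; a collection of such meta-instances is what the meta-classifier is trained on.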



3 Estimating the Performance of One-Class Classifiers 

The base-classifiers' performance is a key factor for producing effective aggregate
meta-features. Unfortunately, the inherent absence of negative examples makes
performance assessment difficult. This is because two important values,
the false positive (FP) and true negative (TN) rates, cannot be measured, and
thus calculating performance metrics such as Accuracy, Precision and F-score
becomes impossible. Notice that each of the following performance metrics
misses one or more values: $Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$, $Precision = \frac{TP}{TP+FP}$
and $F\text{-}score = \frac{2PR}{P+R}$, where $P$ is the Precision and $R$ is the Recall. Alternatively,
we propose using a one-class approach for estimating classifier performance.
A criterion proposed in [8] estimates the F-score by using the classifier's
predictions on positive instances. The criterion, $\frac{pr}{\Pr[Y=+1]}$, where $p$ is the
precision and $r$ is the recall, shares some characteristics with the F-score; note that

$\Pr[f(x)=+1|Y=+1]\Pr[Y=+1] = \Pr[Y=+1|f(x)=+1]\Pr[f(x)=+1]$
$\Rightarrow \Pr[f(x)=+1|Y=+1] / \Pr[f(x)=+1] = \Pr[Y=+1|f(x)=+1] / \Pr[Y=+1]$
$\Rightarrow \frac{r}{\Pr[f(x)=+1]} = \frac{p}{\Pr[Y=+1]}$

By multiplying both sides by $r$, we get the desired measure, henceforth denoted
as the Positive-Only F-score (POF): $POF = \frac{r^2}{\Pr[f(x)=+1]}$

Notice that the recall $r = \Pr[f(x)=+1|Y=+1]$ can be estimated from the
classifier's predictions on positively labeled examples, and $\Pr[f(x)=+1]$ can be estimated
with $\Pr[f(x)=+1] = m^{-1} * \sum_{i=1}^{m} f(x_i)$ from the classifier's output
on the validation set.
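
The estimate described above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: the function name and the sample predictions are hypothetical, predictions are assumed to be encoded as +1 (target) and -1 (other), and the probability of a positive prediction is estimated as the fraction of validation instances labeled +1.

```python
def positive_only_f_score(pos_preds, val_preds):
    """Estimate POF = r^2 / Pr[f(x) = +1].

    pos_preds -- classifier outputs (+1 / -1) on examples known to be positive
    val_preds -- classifier outputs (+1 / -1) on the validation set
    """
    if not pos_preds or not val_preds:
        raise ValueError("both prediction lists must be non-empty")
    # Recall: share of known positives that the classifier accepts.
    recall = sum(1 for y in pos_preds if y == 1) / len(pos_preds)
    # Share of validation instances that the classifier accepts.
    pr_positive = sum(1 for y in val_preds if y == 1) / len(val_preds)
    if pr_positive == 0:
        # Assumption: a classifier that accepts nothing gets the lowest score.
        return 0.0
    return recall ** 2 / pr_positive


# Hypothetical predictions: recall = 3/4, Pr[f(x) = +1] = 3/8.
pof = positive_only_f_score([1, 1, 1, -1], [1, 1, -1, -1, 1, -1, -1, -1])
print(pof)  # 1.5
```

Unlike the F-score, this quantity is not bounded by 1, so it is best read as a relative score for ranking or weighting base-classifiers.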

4 Evaluation 

In this section we examine two aspects concerning TUPSO. First, we test the
proposed aggregation functions on various datasets. Second, we test how
well TUPSO performs in comparison to some common Fix-rule methods. In
order to estimate the generalized classification performance of the mentioned
ensemble schemes, a 5x2 cross-validation procedure was performed [9].

4.1 Experiment Setup 

The evaluation of TUPSO consists of three dimensions: datasets (12 popular
datasets from the UCI collection [10]), combining method (TUPSO or Fix-rule
method) and base-classifier performance metric (POF or none). We used six
inducer setups (i.e. base-classifiers), induced by four algorithms: (i) ADIFA-HM
and (ii) ADIFA-GM [11], (iii) GDE [12], (iv) PGA [13], (v) OC-SVM1 and (vi)
OC-SVM2 [14]. The base-classifiers' properties were kept static during the entire
evaluation. The aforementioned base-classifiers were selected because they represent three
prominent families of one-class classifiers: density (ADIFA), nearest-neighbor
(GDE and PGA) and boundary (SVM). Table 3 presents the setup parameters.



Base Classifier  Algorithm  Parameters
---------------  ---------  ----------------------------------------------
ADIFA-HM         ADIFA      Aggregation = HarmonicMean, Sensitivity = 2%
ADIFA-GM         ADIFA      Aggregation = GeometricMean, Sensitivity = 1%
OC-GDE           OC-GDE     n/a
OC-PGA           OC-PGA     k = 3 (3-nearest-neighbor), a = 0.01
OC-SVM1          OC-SVM     kernel = linear, $\nu$ = 0.05
OC-SVM2          OC-SVM     kernel = polynomial, $\nu$ = 0.05

Table 3: Base-classifiers' setup parameters. Shown are the non-default parameters.



For the Mean-Vote rule we used the classification cutoff $\theta_k = 0.75$.



4.2 Measured Metrics 



For evaluating the real performance of both individual classifiers and ensemble
methods we used a two-class performance metric, rather than POF. In contrast
to the training process, in which the absence of negative examples dictates the
use of one-class evaluation metrics (i.e. POF), the dataset in the testing phase
included instances of both classes, and thus allowed the use of a two-class performance
measure. Specifically, we used the area under the ROC curve (AUC).
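
For readers less familiar with AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the sample scores are hypothetical, and this quadratic-time version is meant only as an illustration, not as the evaluation code used in the experiments.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores -- classifier scores, higher means more likely positive
    labels -- 1 for positive instances, 0 for negative instances
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical test-set scores: three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Because it needs scores for both classes, this measure is only available at test time, where negative instances exist.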

4.3 Experimental Results 

In Table 4 we present the evaluation results. We use 'All', 'WF' and 'Non-WF'
to denote all, weighted and non-weighted aggregate features, respectively (see
Table 2). The number in parentheses is the AUC rank of the corresponding ensemble
method.



Comb.          TUPSO                                                                           Fix-Rule
Dataset        All       WF        Non-WF    SumP      SumV      SumWP     VarP      VarWP     Voting    Max       MeanV     Product
-------------------------------------------------------------------------------------------------------------------------------------
Balance-Scale  0.83(4)   0.85(1)   0.82(6)   0.83(5)   0.85(2)   0.84(3)   0.69(9)   0.73(7)   0.67(12)  0.51(13)  0.68(11)  0.69(10)
Ecoli          0.91(7)   0.92(4)   0.91(5)   0.94(3)   0.94(2)   0.94(1)   0.79(10)  0.44(13)  0.91(6)   0.78(11)  0.84(9)   0.9(8)
Disease        0.69(9)   0.75(2)   0.7(8)    0.73(3)   0.75(1)   0.71(5)   0.68(10)  0.7(7)    0.61(12)  0.51(13)  0.7(6)    0.71(4)
Ionosphere     0.96(2)   0.96(3)   0.96(1)   0.96(5)   0.96(6)   0.96(4)   0.51(13)  0.7(10)   0.89(7)   0.76(8)   0.65(11)  0.74(9)
Letter         0.97(2)   0.97(1)   0.97(3)   0.97(5)   0.96(6)   0.97(4)   0.88(9)   0.93(7)   0.82(11)  0.5(13)   0.78(12)  0.9(8)
M-feat         0.96(5)   0.96(6)   0.96(4)   0.97(2)   0.97(3)   0.97(1)   0.78(10)  0.62(13)  0.91(7)   0.63(12)  0.85(9)   0.85(8)
Opti-Digits    0.98(2)   0.98(1)   0.98(3)   0.97(4)   0.97(5)   0.97(6)   0.78(10)  0.61(13)  0.86(8)   0.65(12)  0.78(9)   0.9(7)
Page-blocks    0.93(5)   0.92(6)   0.94(1)   0.94(3)   0.94(2)   0.93(4)   0.8(10)   0.83(9)   0.85(8)   0.52(13)  0.74(11)  0.87(7)
Splice         0.96(3)   0.96(2)   0.95(4)   0.95(5)   0.94(8)   0.95(6)   0.95(7)   0.96(1)   0.56(12)  0.5(13)   0.77(11)  0.87(10)
Vote           0.93(3)   0.93(5)   0.93(4)   0.93(2)   0.93(6)   0.94(1)   0.75(9)   0.71(10)  0.76(8)   0.51(13)  0.71(11)  0.82(7)
Breast-Cancer  0.96(3)   0.97(1)   0.96(2)   0.96(5)   0.96(6)   0.96(7)   0.8(10)   0.87(8)   0.96(4)   0.5(13)   0.75(12)  0.84(9)
Zoo            0.82(4)   0.88(1)   0.82(5)   0.82(3)   0.78(7)   0.83(2)   0.69(10)  0.78(9)   0.67(11)  0.5(13)   0.78(8)   0.8(6)
-------------------------------------------------------------------------------------------------------------------------------------
Avg. Rank      4.1       2.8       3.8       3.8       4.5       3.7       9.8       8.9       8.8       12.3      10        7.8

Table 4: TUPSO vs. Fix-Rule AUC results.



We can see that the weighted features (SumWP and VarWP) together
produced the best performance. It seems that mixing together the non-weighted
and the weighted features produces a marginally weaker ensemble.

Fig. 2: ROC curves of TUPSO and Fix-rule on the Opt-digits dataset.

Fig. 3: Density graph for the SumV aggregate feature for the Letter dataset.

In Figure 2 we present the ROC graphs of the tested ensemble methods
generated using the Optical-Digits dataset. The best curve belongs to a TUPSO
version that uses all the aggregate features defined in Table 2. A weakness of
the Fix-rule technique is demonstrated in Figure 3. Having 60 base-classifiers,
we plotted the density of the sum of votes for both the 'normal' and 'anomaly'
classes. The classification cutoff value that produces the fewest classification
errors is around 55. Voting, however, will classify any sum of votes above 30 as
'normal', and since most 'anomaly' instances receive more than 30 votes, using
it will result in a high FNR and a low TPR. The Max rule will perform equally
poorly because the probability that no base-classifier will vote 'normal' is very
low for 'anomaly' instances. TUPSO, in contrast, will assign a low probability
to sums of votes lower than 55 and hence classify instances more proficiently.
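
The argument above can be illustrated with a toy Python sketch that compares the fixed majority cutoff with a data-driven one for 60 base-classifiers. The function name and the two vote sums are hypothetical values picked for illustration, not measurements from the Letter dataset. A hard cutoff is also a simplification, since TUPSO's meta-classifier outputs a probability rather than applying a single threshold.

```python
def classify_by_vote_sum(vote_sum, cutoff):
    """Label an instance 'normal' when its sum of votes exceeds the cutoff."""
    return "normal" if vote_sum > cutoff else "anomaly"


N_CLASSIFIERS = 60
majority_cutoff = N_CLASSIFIERS / 2  # fixed in advance by the Voting rule
data_driven_cutoff = 55              # cutoff with the fewest errors, as read from Figure 3

# Hypothetical vote sums, for illustration only.
anomaly_vote_sum = 42
normal_vote_sum = 58

# Majority voting accepts the anomaly because 42 > 30.
print(classify_by_vote_sum(anomaly_vote_sum, majority_cutoff))     # normal
# A cutoff learned from the data rejects it because 42 <= 55.
print(classify_by_vote_sum(anomaly_vote_sum, data_driven_cutoff))  # anomaly
print(classify_by_vote_sum(normal_vote_sum, data_driven_cutoff))   # normal
```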

5 Conclusions and Future Work 

In this paper we proposed a new meta-learning-based ensemble scheme for one-class
problems. The ensemble scheme learns a combining function upon aggregates
of the base-classifiers' predictions. To improve the aggregates' inductive
power, we implemented a classification performance evaluator, which we found
very effective. In future work, we would like to examine TUPSO on more datasets
and investigate additional one-class performance evaluators. Finally, we would
like to discover how TUPSO performs compared to the ensemble's best classifier.